Authors: Bram de Wilde, Anindo Saha, Richard P. G. ten Broek, Henkjan Huisman
Diffusion-based models for text-to-image generation have gained immense popularity due to recent advancements in efficiency, accessibility, and quality. Although it is becoming increasingly feasible to perform inference with these systems using consumer-grade GPUs, training them from scratch still requires access to large datasets and significant computational resources. In the case of medical image generation, the availability of large, publicly accessible datasets that include text reports is limited due to legal and ethical concerns. While training a diffusion model on a private dataset may address this issue, it is not always feasible for institutions lacking the necessary computational resources. This work demonstrates that pre-trained Stable Diffusion models, originally trained on natural images, can be adapted to various medical imaging modalities by training text embeddings with textual inversion. In this study, we conducted experiments using medical datasets comprising only 100 samples from three medical modalities. Embeddings were trained in a matter of hours, while still retaining diagnostic relevance in image generation. Experiments were designed to achieve several objectives. Firstly, we fine-tuned the training and inference processes of textual inversion, revealing that larger embeddings and more examples are required. Secondly, we validated our approach by demonstrating a 2% increase in the diagnostic accuracy (AUC) for detecting prostate cancer on MRI, which is a challenging multi-modal imaging modality, from 0.78 to 0.80. Thirdly, we performed simulations by interpolating between healthy and diseased states, combining multiple pathologies, and inpainting to show embedding flexibility and control of disease appearance. Finally, the embeddings trained in this study are small (less than 1 MB), which facilitates easy sharing of medical data with reduced privacy concerns.
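The interpolation between healthy and diseased states mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each learned embedding is a small matrix of shape (number of vectors, embedding dimension) and that the blend is a plain linear interpolation; the abstract does not state the exact scheme, so the function name `interpolate_embeddings`, the parameter `alpha`, and the toy values are hypothetical and for illustration only.

```python
from typing import List

# Assumed layout: (num_vectors, dim), i.e. a multi-vector token embedding.
Embedding = List[List[float]]


def interpolate_embeddings(healthy: Embedding, diseased: Embedding, alpha: float) -> Embedding:
    """Linearly blend two embeddings of identical shape.

    alpha = 0.0 returns the healthy embedding, alpha = 1.0 the diseased one.
    Linear blending is an assumption, not necessarily the authors' method.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(healthy) != len(diseased):
        raise ValueError("embeddings must have the same number of vectors")
    blended = []
    for h_vec, d_vec in zip(healthy, diseased):
        if len(h_vec) != len(d_vec):
            raise ValueError("embedding vectors must have the same dimension")
        blended.append([(1.0 - alpha) * h + alpha * d for h, d in zip(h_vec, d_vec)])
    return blended


# Toy 2-vector, 3-dimensional embeddings (made-up values, not from the paper).
healthy = [[0.0, 1.0, 2.0], [1.0, 1.0, 1.0]]
diseased = [[2.0, 1.0, 0.0], [3.0, 1.0, -1.0]]
midpoint = interpolate_embeddings(healthy, diseased, 0.5)
```

In a real pipeline the blended matrix would replace the placeholder token's entries in the text encoder's embedding table before sampling; that step depends on the specific Stable Diffusion implementation and is left out here.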
Paper link: http://arxiv.org/pdf/2303.13430v1